LAMP-TR-129, CS-TR-4781, UMIACS-TR-2006-06, January 2006: HANDWRITING IDENTIFICATION, MATCHING, AND INDEXING IN NOISY DOCUMENT IMAGES

Author

  • Yefeng Zheng
Abstract

Throughout history, handwriting has been the primary means of recording information that is preserved across both time and space. With the coming of the electronic document era, we are challenged with making an enormous number of handwritten documents available for electronic access. Although many handwritten documents contain only handwriting, a growing number mix it with printed text, noise, and background patterns. This mixture of handwriting with other components presents a great challenge for making an original document electronically accessible.

Many handwritten documents come with a special background pattern, rule lines, which are printed on the paper to guide writing. After digitization, rule lines touch the text and cause problems for further document image analysis if they are not detected and removed. In this dissertation, we present a rule line detection algorithm based on hidden Markov model (HMM) decoding, achieving both high detection accuracy and a low false alarm rate. After detection, line removal is performed by line width thresholding.

Handwriting often appears alongside printed text, for example as signatures and annotations on a business letter. Handwriting in a printed document often indicates corrections, additions, or other supplemental information that should be treated differently from the main content. The data set we are processing is noisy, which makes the problem more challenging. In this dissertation, we first segment the document at a suitable level, and then classify ...

The support of this research by the US Department of Defense under contract MDA-9040-2C-0406 is gratefully acknowledged.

Similar resources

Handwriting Identification, Matching, and Indexing in Noisy

Title of dissertation: HANDWRITING IDENTIFICATION, MATCHING, AND INDEXING IN NOISY DOCUMENT IMAGES. Yefeng Zheng, Doctor of Philosophy, 2005. Dissertation directed by: Professor Rama Chellappa, Department of Electrical and Computer Engineering. Throughout history, handwriting has been the primary means of recording information that is preserved across both time and space. With the coming of the el...

Concept-based semantic annotation, indexing and retrieval of office-like document units

We present an ontology-driven approach to semantic annotation, indexing and retrieval of document units. This approach is based on a novel semantic document model (SDM) that we developed so that office-like document units can be uniquely identified, semantically annotated with concepts from annotation ontologies, and linked across document boundaries. In the semantic annotation model that we propo...

Indexing of Handwritten Historical Documents - Recent Progress

Indexing and searching collections of handwritten archival documents and manuscripts has always been a challenge because handwriting recognizers do not perform well on such noisy documents. Given a collection of documents written by a single author (or a few authors), one can apply a technique called word spotting. The approach is to cluster word images based on their visual appearance, after s...

CS-TR, August: A Survey of Information Retrieval and Filtering Methods

We survey the major techniques for information retrieval. In the first part, we provide an overview of the traditional ones: full text scanning, inversion, signature files, and clustering. In the second part, we discuss attempts to include semantic information: natural language processing, latent semantic indexing, and neural networks. This work was partially funded by the National Science Foundation under Gra...

Publication date: 2006